Abstract

Learning algorithms generally need to be able to compare several streams of information.
Neural learning architectures hence need a unit, a comparator, able to compare several
inputs encoding either internal or external information, for instance predictions and
sensory readings. Without the possibility of comparing the values of predictions to actual
sensory inputs, reward evaluation and supervised learning would not be possible.

Comparators are usually not implemented explicitly; the necessary comparisons are commonly
performed by directly comparing the respective activities one-to-one. This implies that the
characteristics of the two input streams (like size and encoding) must be provided at the
time of designing the system.

It is however plausible that biological comparators emerge from self-organizing, genetically
encoded principles, which allow the system to adapt to changes in the input and in the
organism. We propose an unsupervised neural circuitry, where the function of input comparison
emerges via self-organization only from the interaction of the system with the respective
inputs, without external influence or supervision.

The proposed neural comparator adapts, unsupervised, according to the correlations present in
the input streams. The system consists of a multilayer feed-forward neural network which
follows a local output minimization (anti-Hebbian) rule for adaptation of the synaptic weights.

The local output minimization allows the circuit to autonomously acquire the capabil- 
ity of comparing the neural activities received from different neural populations, which 
may differ in the size of the population and in the neural encoding used. The compara- 
tor is able to compare objects never encountered before in the sensory input streams and 
to evaluate a measure of their similarity, even when differently encoded. 



1 Introduction 

In order to develop a complex targeted behavior, an autonomous agent must be able to relate
and compare information received from the environment with internally generated information
(see Billing, 2010). For example, it is often necessary to decide whether the visual image
currently being perceived is similar to an image encoded in some form in memory.

For artificial agents, such basic comparison capabilities are typically either hard-coded or
initially taught, both processes involving the inclusion of predefined knowledge (for instance
Bovet and Pfeifer, 2005a,b). However, living organisms must acquire this capability
autonomously, only via interaction with the acquired data, possibly without any explicit
feedback from the environment (O'Reilly and Munakata, 2000). We can therefore hypothesize the
presence of a neural circuitry in living organisms, capable of comparing the information
received by different populations of neurons. It cannot, in general, be assumed that these
populations have a similar configuration, hold the information in the same encoding or even
manage the same type of information.

A system encompassing said characteristics must be based on some form of un- 
supervised learning, it must self-organize in order to autonomously acquire its basic 
functionality. The task of an unsupervised learning system is to elucidate the structure 
in the input data, without using external feedback. Thus all the information should be 
inferred from the correlations found in the input and in its own response to the input 
data stream. 

Unsupervised learning can be achieved using neural networks, and has been implemented
previously for a range of applications (see for instance Sanger, 1989; Atiya, 1990;
Likhovidov, 1997; Furao et al., 2007; Tong et al., 2008). Higher accuracy is generally
expected from supervised algorithms. However, Japkowicz et al. (1995) and Japkowicz (2001)
have shown that for the problem of binary classification, unsupervised learning in a neural
network can perform better than standard supervised approaches in certain domains.

Neural and non-biologically inspired algorithms are often expressed in mathematical terms
based on vectors and their respective elementary operations like vector subtraction,
conjunction and disjunction. Implementations in terms of artificial neural networks hence
typically involve the application of these operations to the output of groups of neurons.
These basic operations are however not directly available in biological neural circuitry,
which is based exclusively on local operations. Connections between groups of neurons evolve
during the growth of the biological agent and may induce the formation of topological maps
(see Kohonen, 1990) but generically do not result in one-to-one neural operations. For
instance, these one-to-one neural interactions would involve an additional global summation
of the result for the case of a scalar product.

It is unclear whether operations like vector operations are directly used by biological
systems; their implementation should in any case be robust to changes in the development of
the system and to its adaptation to different types of input. In effect,
the basic building blocks of most known learning algorithms are the mathematical func- 
tions computers are based on. These are, however, not necessarily present, convenient 
or viable in a biological system. It is our aim to elucidate how a basic mathematical 
function can emerge naturally in a biological system. We present, for this purpose, a 
model of how the basic function of comparison can emerge in an unsupervised neural 
network, based on local rules for adaption and learning. Our adaptive "comparator" 
neural circuit is capable of self-organized adaption, with the correlations present in the 
data input stream as the only basis for inference. 

The circuit autonomously acquires the capability of comparing the information received from
different neural populations, which may differ in size and in the encoding used. The
comparator proposed is based on a multilayer feed-forward neural network, where the input
layer receives two signals y and z, see fig. 1. These two input streams can be either
unrelated, selected randomly or, with a given probability, encode the same information. The
task of the neural comparator is then to determine, for any pair of input signals y and z,
whether they are semantically related or not. Any given pair (y, z) of semantically related
inputs is presented to the system, in general, only one single time. The system hence has to
master the task of discriminating generically between related and unrelated pairs of inputs,
and not the task of extracting statistically recurring patterns.

Figure 1: Architecture of the proposed neural comparator, with intermediate layers $x^{(2)}$
and $x^{(3)}$ and output $x^{(4)}$. The comparator autonomously learns to compare the inputs y
and z. The output is close to zero when the two inputs are identical or correlated and
close to unity when they are different or uncorrelated.

The strength of the synapses connecting neurons are readjusted using anti-Hebbian 
rules. Due to the readjustment of the synaptic weights, the network minimizes its output 
without the help of external supervision. As a consequence, the network is able to 
autonomously learn to discriminate whether the two inputs encode the same information 
or not, independently of whether the particular input configuration has been encountered 
before or not. The system will respond with a large output activity whenever the input 
pair (y,z) is semantically unrelated and with inactivity for related pairs. 
1.1 Motivation and expected use case

We are motivated by a system where the information stored in two different neuronal
populations is to be compared. In particular, we are interested in systems such as the one
presented by Bovet and Pfeifer (2005a), where two streams of information (for instance,
visual input and the desired state of the visual input, or the signal from the whiskers of
a robot compared to the time-delayed state of these sensors) encoded in two separate
neuronal populations are to be compared, in this particular case in order to get a distance
vector between the two. In a fixed artificial system, one could obtain this difference by
simply subtracting the input from each of the streams, provided that the two neuronal
populations are equal and encode the information in the same way. This subtraction can
also be implemented in such a system in a neuromorphic way simply by implementing
a distance function in a neural network. However, we are interested in the case where
both neuron populations have evolved mostly independently, such that they might be
structurally different and might encode the information in a different way, which is
expected in a biological system. Under these conditions, the neuronal circuit comparing
both streams should be able to invert the encoding of both inputs in order to compare
them, a task which could not be solved using a fixed distance function. In addition,
we expect that such a system would be deployed in an environment where different,
semantically unrelated inputs are more probable than related ones. The comparator should
hence be able to solve the demanding task of autonomously extracting semantically related
pairs of inputs out of a majority of unrelated and random input patterns.



2 Architecture of the neural comparator 

The neural comparator proposed consists of a feed-forward network of three layers,
plus an extra layer filtering the maximum output from the third layer, compare fig. 1.
We will refer to the layers as k = 1, 2, 3, 4, where k = 1 corresponds to the input layer
and k = 4 to the output layer. The output of the individual neurons is denoted by
$x_i^{(k)}$, where the superscript refers to the layer and the subscript to the index of the
neuron in the layer, for instance $x_2^{(1)}$ being the output of the second neuron in the
input layer.

The individual layers are connected via synaptic weights $w_{ij}^{(k)}$. In this notation, the
index i corresponds to the index of the presynaptic neuron in layer k, and j corresponds
to the index of the postsynaptic neuron in layer k + 1. Thus $w_{43}^{(1)}$ is the synaptic
weight connecting the fourth input neuron with the third neuron in the second layer.

The layers are generally not fully interconnected. The probability of a connection
between a neuron in layer k and a neuron in layer k + 1 is $p_{conn}^{(k)}$. The values used
for the interconnection probabilities $p_{conn}^{(k)}$ are given in table 1.

In the implementation proposed and discussed here, the output layer is special in that 
it consists only in selecting the maximum of all activities from the third layer. There 
are simple neural architectures based on local operations that could fulfill this purpose. 
However, for simplicity, the task of selecting the maximum activity of the third layer is 
done here directly by a single unit. 

2.1 Input protocol 

The input vectors $x^{(1)}$ consist of two parts,

$$x^{(1)} = (y, z), \tag{1}$$

where y and z are the two distinct input streams to be compared. We used the following
protocol for selecting pairs (y, z):

• y is selected randomly at each time step, with the elements $y_i \in [0, 1]$ drawn from
a uniform distribution.

• z is selected via

$$z = \begin{cases} \text{random} & \text{with probability } 1 - p_{eq} \\ f(y) & \text{with probability } p_{eq} \end{cases} \tag{2}$$

If the inputs z and y carry the same information, they are related via z = f(y), where
f is generically an injective transformation. This relation reduces to z = y when f is the
identity, i.e. when the two neural populations y and z use the same encoding.

We consider two kinds of encoding, direct encoding with f being the identity, and
encoding through a linear transformation, which we refer to as "linear encoding",

$$z = y \quad \text{or} \quad z = A y, \tag{3}$$

where A is a random matrix. The encoding is maintained throughout individual simulations
of the comparator. For the case of linear encoding, the matrix A is selected initially and
not modified during a single run.

The procedure we used to generate the matrix A consists of choosing each element
of the matrix as a random number taken from a continuous flat distribution of values
between -1 and 1. The matrix is then normalized such that the elements of the vector z
satisfy $z_i \in [-1, 1]$.
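The input protocol (2)-(3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `make_encoding_matrix` and `sample_pair` are hypothetical, the row-wise L1 normalization of A is one possible way to guarantee $|z_i| \le 1$ (the text does not specify the normalization used), and the sampling interval is left as a parameter.

```python
import numpy as np

def make_encoding_matrix(n, rng):
    """Random matrix A with entries drawn uniformly from [-1, 1]."""
    a = rng.uniform(-1.0, 1.0, size=(n, n))
    # Assumed normalization (not specified in the text): divide each row by
    # its L1 norm, so that |z_i| = |sum_j A_ij y_j| <= 1 whenever |y_j| <= 1.
    return a / np.abs(a).sum(axis=1, keepdims=True)

def sample_pair(n, p_eq, rng, encoding=None, low=0.0, high=1.0):
    """Draw one pair (y, z) following protocol (2).

    With probability p_eq the pair is related, z = f(y), where f is the
    identity (encoding=None) or the linear map z = A y. Otherwise z is
    drawn independently of y.
    """
    y = rng.uniform(low, high, size=n)
    related = rng.random() < p_eq
    if related:
        z = y.copy() if encoding is None else encoding @ y
    else:
        z = rng.uniform(low, high, size=n)
    return y, z, related

rng = np.random.default_rng(0)
A = make_encoding_matrix(5, rng)          # fixed for the whole run
y, z, related = sample_pair(5, 0.2, rng, encoding=A)
```

The matrix A is generated once and reused for every pair, matching the statement that the encoding is not modified during a single run.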

2.2 Synaptic weights readjustment: anti-Hebbian rule 

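Each neuron integrates its inputs, via

$$x_j^{(k+1)} = g\Big(\sum_i w_{ij}^{(k)} x_i^{(k)}\Big), \qquad g(x) = \tanh(\alpha x), \tag{4}$$

with g(x) being the transfer function, α the gain and $w_{ij}^{(k)}$ the afferent synaptic
weights. After the information is passed forward, the synaptic weights are corrected using an
anti-Hebbian rule,

$$w_{ij}^{(k)}(t+1) = w_{ij}^{(k)}(t) - \eta\, x_i^{(k)} x_j^{(k+1)}, \tag{5}$$

where η is the learning rate.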

Neurons under an anti-Hebbian learning rule will modify their synaptic weights in order to
minimize their output. Note that anti-Hebbian adaption rules generically result from
information maximization principles (Bell and Sejnowski, 1995). Information maximization
favors spread-out output activities for statistically independent inputs (Markovic and Gros,
2010), thus allowing the system to filter out correlated input pairs (y, z), which tend to
induce a low level of output activity.

The incoming synaptic weights to neuron j of the (k + 1)th layer are additionally
normalized after each update of the synaptic weights,

$$w_{ij}^{(k)}(t+1) \;\to\; \frac{w_{ij}^{(k)}(t+1)}{\sqrt{\sum_l \big[w_{lj}^{(k)}(t+1)\big]^2}}. \tag{6}$$
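As an illustration of the local adaption step, the following sketch applies a forward pass through one sparsely connected layer, an anti-Hebbian weight correction and a renormalization of the incoming weights. It is a minimal sketch, not the authors' implementation: the helper names, the Euclidean form of the normalization, the layer sizes and the parameter values are assumptions made for the example.

```python
import numpy as np

def normalize(w):
    """Rescale the incoming weights of every postsynaptic neuron (columns).

    Assumption: Euclidean normalization; unconnected neurons are left at zero.
    """
    norms = np.sqrt((w ** 2).sum(axis=0, keepdims=True))
    norms[norms == 0.0] = 1.0
    return w / norms

def init_layer(n_pre, n_post, p_conn, rng):
    """Random sparse layer: each connection exists with probability p_conn."""
    mask = rng.random((n_pre, n_post)) < p_conn
    w = rng.uniform(-1.0, 1.0, size=(n_pre, n_post)) * mask
    return normalize(w), mask

def forward(w, x_pre, alpha):
    """x_post[j] = tanh(alpha * sum_i w[i, j] * x_pre[i])."""
    return np.tanh(alpha * (x_pre @ w))

def anti_hebbian_step(w, mask, x_pre, alpha, eta):
    """One local update: forward pass, anti-Hebbian correction, renormalization."""
    x_post = forward(w, x_pre, alpha)
    w = w - eta * np.outer(x_pre, x_post) * mask   # only existing synapses change
    return normalize(w), x_post

rng = np.random.default_rng(1)
w, mask = init_layer(8, 4, 0.5, rng)
x = rng.uniform(-1.0, 1.0, size=8)
for _ in range(200):
    w, x_post = anti_hebbian_step(w, mask, x, alpha=1.0, eta=0.05)
```

Repeatedly presenting the same input drives the magnitude of each output towards smaller values, which is the output-minimizing behavior the rule is meant to produce.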

The algorithm proposed here is based on the idea that correlated inputs will lead
to a small output, as a consequence of the anti-Hebbian adaption rule. Uncorrelated
pairs of input (y, z) will on the other hand generate, in general, a substantial output,
as they correspond to random inputs for which the synaptic weights are not adapted to
minimize the output. It is worthwhile to remark that using a Hebbian adaption rule and
classifying the minimum values as uncorrelated would not achieve the same accuracy
as with the proposed anti-Hebbian rule with output values between -1 and 1. The reason
is that we seek a comparator capable of comparing arbitrary pairs (y, z) of input, and
not specific examples.

When using an anti-Hebbian rule, zero output is an optimum for any correlated
input. In the case of input with equal encoding, this is reached when the synaptic
weights cancel exactly ($w_{\mathrm{left}} = -w_{\mathrm{right}}$) in the first layer, compare
fig. 1. In contrast, if a Hebbian rule were used, the optimum value for correlated input
would correspond to the synaptic weights of correlated input being as large as possible. The
consequence is that all synaptic weights would tend to increase constantly, eventually
leading to all outputs reaching their maximum values.

There remains, for anti-Hebbian adaption rules, a finite probability for uncorrelated
inputs to have a low output by mere chance, viz. the terms $w_{ij}^{(1)} x_i^{(1)}$
originating from y and z may cancel out. In such cases the comparator would misclassify
the input. The occurrence of misclassification is reduced substantially by having multiple
neurons in the third layer.

By selecting an inter-layer connection probability $p_{conn}^{(k)}$ well below unity, the
individual neurons in the third layer will have access to different components of the
information encoded in the second layer. This setup is effectively equivalent to generating
different and independent parallel paths for the information transfer, adding robustness
to the learning process, since only strong correlations between the input pairs (y, z),
shared by the majority of paths, are then acquired by all neurons.

In addition to diminishing the possibility of random misclassification due to the
multiple paths, the use of anti-Hebbian learning in the third layer minimizes the incidence
of the individual parallel paths which consistently result in $x_j^{(2)}$ outputs that are
far larger than the rest (failing paths, since they are unable to learn some correlations).
Thus adding this layer results in a significant increase in accuracy with respect to an
individual 2-layer comparator. The accuracy is further improved by adding a filtering
layer for input classification.



2.3 Input classification 

Each third-layer neuron encodes the estimated degree of correlation within the input
pairs (y, z). The fourth layer selects the most active third-layer neuron,

$$x^{(4)} = \max_j \big| x_j^{(3)} \big|. \tag{7}$$

By selecting the maximum of all outputs in the third layer, the circuit looks for a 
"consensus" among the neurons in the third layer. A given input pair (y,z) needs to be 
considered as correlated by all third-layer neurons in order to be classified as correlated 
by the fourth layer. This, together with the randomness of the inter-layer connections, 
increases the robustness of the classification process. 

There are several options for evaluating the effectiveness of the neural comparator.
We will later discuss, in sect. 4, an analysis in terms of fuzzy logic, and consider now a
measure of the accuracy of the system in terms of binary classification. The inputs y and
z are classified according to the strength of the value of $x^{(4)}$. For binary
classification we use a simple threshold criterion. The inputs y and z are considered to be
uncorrelated if

$$x^{(4)} > \theta,$$

and otherwise correlated. In this work, the value for the threshold θ is determined by
minimizing the probability of misclassification, in order to test the possible accuracy of
the system. The same effect could be achieved by keeping θ fixed and optimizing the slope α
of the transfer function (4), since the optimal θ depends on the slope α. These parameters,
the slope α or the discrimination threshold θ, may in principle be optimized autonomously
using information theoretical objective functions (Triesch, 2005; Markovic and Gros, 2010).
For simplicity we here perform the optimization directly. We will show in sec. 3.4 that the
optimal values for α and θ depend essentially only on the size N of the input. Minor
adjustments of the parameters might anyway be desirable to maintain an optimal accuracy. In
any case, these readjustments can be done in a biological system via intrinsic plasticity
(see Stemmler and Koch, 1999; Mozzachiodi and Byrne, 2010; Markovic and Gros, 2011).
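The max-selection (7) followed by the threshold criterion can be written compactly. The snippet below is a sketch only; the function names and the example activities are made up for illustration, and the threshold value is arbitrary.

```python
import numpy as np

def comparator_output(x3):
    """Fourth-layer output: the largest absolute third-layer activity, eq. (7)."""
    return float(np.max(np.abs(x3)))

def classify(x3, theta):
    """Binary decision: 'uncorrelated' if the output exceeds the threshold theta."""
    return "uncorrelated" if comparator_output(x3) > theta else "correlated"

# Hypothetical third-layer activities for two input pairs.
print(classify(np.array([0.1, -0.5, 0.2]), theta=0.3))   # one neuron is strongly active
print(classify(np.array([0.05, -0.1, 0.0]), theta=0.3))  # all neurons are nearly silent
```

Because the maximum is taken over absolute values, a single strongly active third-layer neuron is enough to flag the pair as uncorrelated, which is the "consensus" requirement described above.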



Although we did not implement the max function present in (7) in a neuromorphic
form, a small neuronal circuit implementing (7) could for instance be realized as a
winner-takes-all network (Coultrip et al., 1992; Kaski and Kohonen, 1994; Carpenter and
Grossberg, 1987). Alternatively, a filtering rule different from the max function
could be used for the last layer, for instance the addition or averaging of all the inputs.
We present as supporting information some results showing the behavior of the output
when using averaging and sum as alternative filtering rules for the output layer. Our
best results were however found by implementing the last layer as a max function. In
this work we will discuss the behavior of the system using the max function as the last
layer.

We would like to remark that defining a threshold is one way of using this system 
for binary classification, which we use for reporting the possible accuracy of the system. 
However, it is not a defining part of the model. We expect the system to be more useful 
for obtaining a continuous variable measuring the grade of correlation of the inputs. 
As we discuss in sec. 4, this property can be used to apply fuzzy logic in a biological
system. 

3 Performance in terms of binary classification 



3.1 Performance measures 

In order to assess the performance of the neural comparator in terms of binary
classification, we need to track the numbers of correct and incorrect classifications. We use
the following three measures for classification errors:

FP false positives: The fraction of cases for which $x^{(4)} < \theta$ (input is classified as
correlated) occurs for uncorrelated pairs of input vectors y and z:

$$FP = \frac{\#\,\text{erroneously assigned as positive}}{\#\,\text{assigned as positive}}. \tag{8}$$

FN false negatives: The fraction of cases with output activity $x^{(4)} > \theta$ (input
classified as uncorrelated) occurring for correlated pairs of input vectors, z = f(y):

$$FN = \frac{\#\,\text{erroneously assigned as negative}}{\#\,\text{assigned as negative}}. \tag{9}$$

E overall error: The total fraction of errors E is the fraction of overall wrong
classifications:

$$E = \frac{\#\,\text{erroneously assigned}}{\#\,\text{all assignments}}. \tag{10}$$

All three performance measures, E, FP and FN, need to be kept low. This is achieved
for a classification threshold θ which minimizes (FP + FN). This condition keeps all
three error measures, FP, FN and E, close to their minimum, while giving FN and
FP equal importance at the same time.
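The three error measures (8)-(10) and the choice of θ by minimizing FP + FN can be sketched as follows. This is an illustrative sketch: the function names, the grid of candidate thresholds, the convention of returning zero when nothing is assigned to a class, and the toy data are all assumptions of the example.

```python
import numpy as np

def error_measures(x4, related, theta):
    """Return (FP, FN, E) for comparator outputs x4 and ground truth `related`."""
    x4 = np.asarray(x4, dtype=float)
    related = np.asarray(related, dtype=bool)
    positive = x4 < theta            # classified as correlated
    negative = ~positive             # classified as uncorrelated
    wrong_pos = np.sum(positive & ~related)
    wrong_neg = np.sum(negative & related)
    # Assumed convention: an empty class contributes zero error.
    fp = wrong_pos / positive.sum() if positive.any() else 0.0
    fn = wrong_neg / negative.sum() if negative.any() else 0.0
    e = (wrong_pos + wrong_neg) / x4.size
    return float(fp), float(fn), float(e)

def best_threshold(x4, related, candidates=None):
    """Scan candidate thresholds and keep the one minimizing FP + FN."""
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    scores = [sum(error_measures(x4, related, t)[:2]) for t in candidates]
    i = int(np.argmin(scores))
    return float(candidates[i]), float(scores[i])

# Toy data: five outputs, the first two stemming from related pairs.
x4 = [0.1, 0.2, 0.8, 0.9, 0.15]
related = [True, True, False, False, False]
theta, score = best_threshold(x4, related)
```

Note that FP and FN are normalized by the number of pairs assigned to each class, as in (8) and (9), rather than by the number of pairs that truly belong to it.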
Figure 2: The relative mutual information MI% (12) as a function of the amount of
correlation a between the input and the output, defined by eq. (13). The comparator achieves
an MI% of about 50% and 80% respectively for fractions of $p_{eq} = 0.2$ (dashed line) and
$p_{eq} = 0.5$ (solid line) of correlated input pairs, corresponding to 0.12/0.16 = 0.75 and
0.23/0.25 = 0.92 of all input-output correlations a. Hence only 25% and 8%, respectively
(darker shaded areas), of all correlations are not recovered by the comparator.



3.1.1 Mutual Information

Since the percentage of erroneous classifications, despite being an intuitive measure,
depends on the relative number of correlated and uncorrelated inputs presented to
the system, we also evaluate the mutual information (MI) (see for instance Gros, 2008)
as a measure of the information that has been gained by the classification made by the
comparator:

$$MI(X,Y) = H[X] - H[X|Y] = -\sum_x p(x)\log(p(x)) + \sum_y p(y) \sum_x p(x|y)\log(p(x|y)), \tag{11}$$

where X in this case represents whether the inputs are equal or not, and Y whether
the comparator classified the input as correlated or not. Both X and Y therefore take
two values (true/false, corresponding to semantically related/uncorrelated).
Here p(x|y) is the conditional probability that the input had been x = true/false given
that the output of the comparator is y = true/false, and H[X] the marginal information
entropy.



We will refer specifically to the mutual information (11) between the binary input
and output of the neural comparator, also known in this context as information gain. The
mutual information (11) can also be written as
$\sum_{x,y} p(x,y)\log\big(p(x,y)/(p(x)p(y))\big)$, where p(x,y) = p(x|y)p(y) is the joint
probability. It is symmetric in its arguments X and Y and positive definite. It vanishes for
uncorrelated processes X and Y, viz. when p(x|y) = p(x), i.e. for a random output of the
comparator. Finally, the mutual information is maximal when the two processes are 100%
correlated, that is, when the off-diagonal probabilities vanish, p(x|y) = 0 for x ≠ y. In
this case the two marginal distributions $p(x) = \sum_y p(x,y)$ and $p(y) = \sum_x p(x,y)$
coincide and MI(X,Y) is identical to the coinciding marginal entropies, H[X] = H[Y].

We will present the mutual information as the percentage of the maximally achievable
mutual information,

$$MI\%(X,Y) = \frac{MI(X,Y)}{H[X]} = 1 - \frac{\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y)\log(p(x|y))}{\sum_{x \in X} p(x)\log(p(x))}, \tag{12}$$

which hence has a value between 0 and 1, and is therefore more intuitive to read
as a percentage of the maximum theoretically possible. The maximum mutual information
achievable by the system depends solely on the probabilities of correlation/decorrelation,
i.e. on $p_{eq}$.

Table 1: Parameters used in the simulations. N: input vector size, $N^{(k)}$: size of layer k,
$t_{max}$: number of steps of simulation, $p_{eq}$: probability of equal inputs,
$p_{conn}^{(k)}$: probability of connection from the kth layer to the (k + 1)th layer,
α: sigmoid slope, η: learning rate.

The statistical correlations between the input X and the output Y can be parametrized
using a correlation parameter a, via

$$p(x,y) = p(x)p(y) + C_a(x,y), \tag{13}$$

where $C_a(x,y)$ are the elements of the matrix

$$C_a = \begin{pmatrix} a & -a \\ -a & a \end{pmatrix}, \qquad a \in [0,\, p_{eq}(1 - p_{eq})]. \tag{14}$$

Here $p_{eq}$ is the probability of having correlated pairs of inputs, viz.
p(x = true) = $p_{eq}$ and p(x = false) = 1 - $p_{eq}$. Using this parametrization allows us
to evaluate the relative mutual information (12) generically for a correlated joint
probability p(x, y), as illustrated in fig. 2. The parametrization (13) hence provides an
absolute yardstick for the performance of the neural comparator.
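As a numerical check of this yardstick, the relative mutual information can be evaluated directly from the parametrized joint probability (13)-(14). The sketch below assumes that the marginal of the output equals that of the input, p(y) = p(x), and normalizes the mutual information by the marginal entropy H[X]; the function names are hypothetical.

```python
import numpy as np

def joint_from_correlation(p_eq, a):
    """Joint probability p(x, y) = p(x) p(y) + C_a(x, y), with p(y) = p(x) assumed."""
    p = np.array([p_eq, 1.0 - p_eq])
    c = np.array([[a, -a], [-a, a]])
    return np.outer(p, p) + c

def relative_mutual_information(joint):
    """MI(X, Y) / H[X] for a 2x2 joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    indep = np.outer(px, py)
    nz = joint > 0                      # skip vanishing entries (0 log 0 = 0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / indep[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    return float(mi / hx)

# p_eq = 0.2: a = 0.12 is 75% of the maximal correlation 0.2 * 0.8 = 0.16.
print(relative_mutual_information(joint_from_correlation(0.2, 0.12)))
```

For $p_{eq} = 0.2$ and a = 0.12 this evaluates to a relative mutual information slightly below one half, in line with the statement that an MI% of about 50% corresponds to 75% of the maximally achievable correlation.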



3.2 Simulation results 

We performed a series of simulations using the network parameters listed in table 1
and for two encoding rules (2), direct and linear encoding. The lengths N of the input
vectors y and z are taken to be equal, if not stated otherwise.

3.2.1 Low Probability of Equals

Since our initial motivation for the design of this system is the comparison of two input 
streams that are presumably most of the time different, we have studied the behavior of 
the system when there is a lower probability of an event where both streams are equal 
than otherwise. We used $p_{eq} = 0.2$ in (2), viz. in 20% of the cases the relation z = f(y)
holds and in the remaining 80% the two inputs y and z are completely uncorrelated 
(randomly drawn). Each calculation consists of tmax = 10^ steps, from which the last 
10% of the simulation is used for the evaluation of the performance. During said last 
10% of the simulation the system keeps learning, i.e. there is no separation between
training and production stages. The purpose of taking only the last portion is to ignore 
the initial phase of the learning process, since at that stage the output does not provide 
a good representation of the system's accuracy. 

In table 2 we present the mean values for the different measures of error, eqs. (8)-(10),
observed for 100 independent simulations of the system. For each individual simulation, the
interlayer connections are randomly drawn with probabilities $p_{conn}^{(k)}$; the
parameters are as shown in table 1. The errors for each run are calculated using a
threshold θ that minimizes the sum of errors FP + FN. Each input in the first layer
has a uniform distribution of values between -1 and 1. The accuracy of the comparator
is generally above 90%, in terms of binary classification errors. There is, importantly,
no appreciable difference in the accuracy when using direct encoding or linear encoding
with random matrices.



Note that a relative mutual information of MI% ≈ 50% is substantial (Guo et al., 2005).
A relative mutual information of 50% means that the correlation between the input and the
output of the neural comparator encompasses 75% of the maximally achievable correlations, as
illustrated in fig. 2.

We found that the performance of the comparator depends substantially on the degree
of similarity of the two input signals y and z for the case when the two inputs are
uncorrelated. For a quantitative evaluation of this dependency we define the Euclidean
distance

$$d = \| y - f^{-1}(z) \|, \tag{15}$$

where $\|\cdot\|$ denotes the Euclidean norm of a vector. For small input sizes N, a
substantial fraction of the input vectors are relatively similar, with small Euclidean
distance d, resulting in a small output $x^{(4)}$. This can prevent the comparator from
learning the classification effectively; thus the best accuracy is obtained for input
vectors of size greater than N = 10, compare table 2.

            Direct encoding                    Linear encoding
  N       E      FP      FN     MI%         E      FP      FN     MI%
  5     [... 10.5%  13.2% ...]            [...]   8.3%  14.8%   23.8%
  15     6.0%   1.2%   6.8%   44.4%        5.2%   2.8%   5.9%   41.7%
  30     5.3%   1.0%   6.0%   49.5%        4.8%   1.3%   5.4%   50.8%
  60     6.6%   0.6%   7.4%   45.3%        4.3%   1.0%   5.0%   54.7%
  100    6.5%   0.5%   7.5%   45.5%        5.3%   0.6%   6.1%   51.5%
  200    7.8%   0.9%   8.8%   37.3%        6.2%   0.9%   7.2%   44.2%
  400    7.1%   0.8%   7.5%   43.5%        7.2%   0.7%   8.1%   41.5%
  600     -      -      -      -           6.7%   0.5%   7.0%   50.8%

Table 2: Errors obtained from averaging 100 runs of the comparator, using direct and
linear encoding after $t_{max}$ steps, for different input sizes N. The connection
probabilities used are $p_{conn}^{(1)} = 0.3$, $p_{conn}^{(2)} = 0.8$. For N > 5 the standard
deviation amounts to 0.1-0.8% (decreasing with N) for the errors E, FP and FN, and 1% for
MI%. For the N = 5 case, the standard deviation of the errors is 5-14% (again, decreasing
with N) while for MI% it amounts to 15%.

The above phenomenon can be investigated systematically by considering two distinct
distributions for the Euclidean distance d. Within our input protocol (2) the pairs y
and z are statistically independent with probability (1 - $p_{eq}$). We have considered two
ways of generating statistically unrelated input pairs.

Unconstrained: The components $y_i$ and $z_i$ are selected randomly from the interval
[-1, 1]. (16)

and

Constrained: The components $y_i$ and $z_i$ are selected randomly such that the distance d
has a flat (uniform) distribution in [0, 1]. (17)

For the case of the 'unconstrained' input protocol the distribution of distances d is
sharply peaked for large input size N, compare fig. 3. The impact of the distribution
of Euclidean distances between the random input vectors y and z is presented in fig. 3,
where we show the results of three separate simulations:

a) Using the unconstrained input protocol (16) for both training and testing. The
corresponding performance errors are FP = 1.0%, FN = 10.7% and E = 9.7%,
for a threshold θ = 0.34.

b) Using the unconstrained input protocol (16) for training and the constrained (17)
protocol for testing. The performance errors are FP = 13.7%, FN = 14.6% and
E = 14.2%, for a threshold θ = 0.28.

c) Using the constrained input protocol (17) for both training and testing. The
corresponding errors are FP = 79.9%, FN = 0.0% and E = 79.9%, for a threshold
θ = 0.02.

The accuracy of the comparator is very good for a). In this case values close to d ≈ 0
are almost nonexistent for random input pairs y and z; random and related input pairs are
clustered in distinct parts of the phase space.
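The sharpening of the distance distribution with growing input size under the unconstrained protocol (16) can be illustrated numerically. The sketch below assumes direct encoding (f is the identity, so d = ||y - z||) and uses a hypothetical helper `distance_spread`; the sample count is arbitrary.

```python
import numpy as np

def distance_spread(n, samples, rng):
    """Mean and relative spread (std / mean) of d = ||y - z|| for random pairs.

    Both y and z are drawn independently with components uniform in [-1, 1],
    as in the unconstrained protocol; direct encoding is assumed.
    """
    y = rng.uniform(-1.0, 1.0, size=(samples, n))
    z = rng.uniform(-1.0, 1.0, size=(samples, n))
    d = np.linalg.norm(y - z, axis=1)
    return float(d.mean()), float(d.std() / d.mean())

rng = np.random.default_rng(2)
for n in (5, 400):
    mean_d, rel_spread = distance_spread(n, 2000, rng)
    print(n, mean_d, rel_spread)
```

The relative spread of d shrinks roughly like the inverse square root of N, so for large N unrelated pairs concentrate around a typical distance well separated from d = 0, where the related pairs sit.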

The performance of the comparator drops, on the other hand, with an increasing number
of similar random input pairs. For the case c) the distribution of distances d is uniform
and the comparator has essentially no comparison capabilities. Since 20% of the
input is correlated, the minimal error E in this case is obtained if the system assumes
all input to be uncorrelated (i.e. setting an extremely small threshold). That situation
results in 80% FP and 20% FN. Notice that in this case the mutual information
of the system is null. Lastly, in the mixed case b) the comparator is trained with an
unconstrained distribution for the distances d and tested using a constrained distribution.
In this case the comparator still acquires a reasonable accuracy of E = 14%.

3.2.2 Equilibrated Input, $p_{eq} = 0.5$

In this subsection we expand the results for equilibrated input data sets, viz. $p_{eq} = 0.5$
in (2). The procedure remains as described in the previous section. Again, each calculation
consists of tmax = 10"^ steps, from which the last 10% of the simulation is used for 
performance evaluation. This result is consistent with the intuitive notion that it is
substantially harder to learn when y and z are related if most of the input stream
is just random noise and semantically correlated input pairs occur only seldom. For
applications one may hence consider a training phase with a high frequency p_eq of
semantically correlated input pairs.
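
As an illustration of how the frequency p_eq enters the input stream, the following sketch draws input pairs that are identical with probability p_eq and independent otherwise. The helper `sample_pair`, the uniform distribution of the activities and the use of direct encoding (z = y) are assumptions made for this example only; the protocols actually used are those defined by (16) and (17).

```python
import numpy as np

def sample_pair(n, p_eq, rng):
    """Return (y, z, is_equal) for one comparison step."""
    y = rng.random(n)                  # assumption: activities uniform in [0, 1)
    if rng.random() < p_eq:
        return y, y.copy(), True       # correlated pair, direct encoding z = y
    return y, rng.random(n), False     # unrelated pair: z drawn independently

rng = np.random.default_rng(0)
pairs = [sample_pair(20, 0.5, rng) for _ in range(4000)]
fraction_equal = np.mean([is_equal for _, _, is_equal in pairs])
print(fraction_equal)
```

Raising `p_eq` during a training phase simply increases the share of steps in which the comparator sees a correlated pair.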

As seen in table 3, the use of a balanced input set does not change the general
behavior but results in a substantial increase in performance. The accuracy of the system in
terms of the percentage of correct classifications (above 95% accuracy except for very small
input sizes) and the relative mutual information MI% (≈ 80% of the maximum information
gain) is very high. A relative mutual information of MI% ≈ 80% means that the system
recovers over 92% of the maximally achievable correlations between the input and the
output, as shown in fig. 2.

Figure 3: The probability of an output x to occur as a function of the Euclidean
distance d (15) of the input pairs y and z (see fig. 1), as a density plot (color coded),
using N = 400, α = 1 and direct encoding. The last 10% of simulations with 10'' input
pairs have been taken for the performance testing. a) Unconstrained input protocol (16)
both for training and for testing. b) Unconstrained input protocol (16) for training and
constrained (17) for testing. c) Constrained input protocol (17) for both training and
testing.



3.3 Effect of noisy encoding 

In the previous sections we have provided results showing that the proposed comparator 
can achieve a good accuracy despite the fact that a large part of the input is noise. 
In addition, the comparator is also robust against a level of noise in the encoding of 
the inputs. Random noise in the encoding would correspond to the neural populations 
having rapid random reconfigurations or random changes in the individual neurons' 
behavior above a certain level. 

As shown in fig. 4, the accuracy of the system decays if the encoding is affected
by random noise of the same magnitude as the average input activity (0.5). For this
calculation, we define the random noise in the encoding as adding a random number
between 0 and ε to each element of one of the compared inputs, i.e. y_i → y_i + r_i, where
r_i ∈ [0, ε]. The values r_i are redrawn in every step of the calculation.

    N      ⟨E⟩          ⟨FP⟩         ⟨FN⟩          MI%
    5      9±6 %        8±7 %        10±5 %        58±14 %
    15     3.9±0.4 %    0.4±0.1 %    6.9±0.6 %     78±1 %
    30     3.4±0.2 %    0.3±0.1 %    6.3±0.3 %     81±1 %
    60     3.3±0.1 %    0.2±0.1 %    6.1±0.2 %     81±1 %
    100    3.4±0.1 %    0.2±0.1 %    6.2±0.1 %     82±1 %
    200    4.7±0.1 %    0.5±0.3 %    8.2±0.5 %     75±1 %
    400    6.2±0.1 %    0.4±0.1 %    10.9±0.1 %    70±1 %
    600    7.5±0.1 %    1.1±0.1 %    12.4±0.1 %    66±1 %

Table 3: Errors obtained from averaging 100 runs of the comparator, using linear
encoding, with p_eq = 0.5, for different input sizes N. The connection probabilities used
are p_conn^(1) = 0.3, p_conn^(2) = 0.8.




Figure 4: Errors E, FP, FN, and percentage of information gain MI% for different
levels of noise ε in the encoding. ε = 0.5 corresponds to a magnitude equal to the
average activity of an input.
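
The noise model of this section, in which a fresh random number between 0 and ε is added to every element of one input at each step, can be sketched as follows. The function name `add_encoding_noise` and the uniform distribution of the noise are assumptions of this example.

```python
import numpy as np

def add_encoding_noise(y, eps, rng):
    """Add an independent random number in [0, eps) to each element of y."""
    r = rng.uniform(0.0, eps, size=y.shape)  # redrawn on every call, i.e. every step
    return y + r

rng = np.random.default_rng(0)
y = rng.random(20)                           # one of the two compared inputs
z_noisy = add_encoding_noise(y, 0.5, rng)    # eps = 0.5: the average input activity
print(np.max(z_noisy - y))
```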

The addition of random noise in the encoding is effectively seen by the system as
a slightly different input. Since the system is designed to classify inputs as either
different or equal, a large level of noise drives the system into classifying the input as
different. However, if the input is only slightly changed, the correlation is still found by
the comparator and the output remains under the threshold for classification.

3.4 Impact of the frequency of correlated input and input size

In fig. 5 the dependency of the optimal threshold θ and of the errors E, FP, FN and MI%
on the probability p_eq is shown. At a constant input size, the threshold shows only a
weak dependence on the probability p_eq. The threshold changes most strongly when
the probability of either case is of the order of 10% or less. The threshold varies by less
than 0.1 from p_eq = 0.1 to p_eq = 0.9. This indicates that the system would still be effective
if the probabilities of the events change significantly, even without readjusting the parameters
α or θ, or with a small readjustment if the change is extreme.

Figure 5: On the left, the errors E, FP, FN, and relative mutual information MI%
for different frequencies of correlated input p_eq (percentages). The results are presented
for input sizes N = 20 and 60. On the right, the optimal values of the threshold θ for
different frequencies of correlated input p_eq. The results are again presented for input
sizes N = 20 and 60.



In fig. 6 the dependence of the optimal threshold θ on the input size N is presented.
The threshold has a marked logarithmic dependency on the system
size. In effect, the threshold θ, the gain α and the system size N are all strongly coupled,
such that given an input size the rest of the parameters are essentially fixed.



3.5 Comparison of inputs with different sizes 

The comparator successfully compares inputs of different sizes. In table 4 we show
the average accuracy over 100 runs of a comparator where one of the vectors to be
compared has a size N and the other has a larger size N + Δ. The number of extra inputs
Δ is kept constant during the whole simulation. In each step, the values of the
two vectors are assigned as described previously as "linear encoding" in sec. 2.1. The
linear encoding is done in this case with a matrix A that has dimensions (N + Δ) × N,
thus the information gets encoded in a vector of higher dimension.
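
A minimal sketch of such an encoding into a higher dimension is given below. It assumes that the linear encoding amounts to z = A y with a random matrix A that is drawn once and then kept fixed; the distribution of the matrix entries and any normalization used in sec. 2.1 are not reproduced here, and the names `encode`, `n` and `delta` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 20, 5                    # input size N and number of extra inputs
A = rng.random((n + delta, n))      # assumption: uniform entries, fixed for the whole run

def encode(y):
    """Map an N-dimensional input onto an (N + delta)-dimensional encoding."""
    return A @ y

y = rng.random(n)
z = encode(y)                       # correlated partner of y, in a different encoding
print(y.shape, z.shape)
```

The comparator then receives y (size N) and z (size N + Δ) as its two input streams.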

The accuracy of the comparator does not decrease but, rather surprisingly, slightly
increases. There is no loss in accuracy because the anti-Hebbian adjustment of the
synaptic weights drives the output close to zero only for correlated input; the
uncorrelated inputs are not minimized in this way. We attribute the small increase
in accuracy to the larger number of neurons involved in the system.




Figure 6: Optimal threshold θ vs. input size N, for α = 2.7. The behavior is markedly
logarithmic.







            N = 20                                    N = 60
    Δ     ⟨E⟩       ⟨FP⟩      ⟨FN⟩      MI%       ⟨E⟩       ⟨FP⟩      ⟨FN⟩      MI%
    0     3.4±0.3   0.3±0.1   6.3±0.4   80±1      3.3±0.1   0.2±0.1   6.1±0.2   81±1
    5     3.1±0.3   0.3±0.1   5.7±0.5   82±1      3.4±0.3   0.3±0.1   6.3±0.2   80±1
    10    2.7±0.2   0.3±0.1   4.9±0.4   84±1      2.9±0.1   0.2±0.1   5.4±0.2   83±1
    20    2.2±0.2   0.3±0.1   4.0±0.4   86±1      2.6±0.1   0.2±0.1   4.8±0.1   85±1
    40    1.6±0.1   0.3±0.1   2.9±0.3   89±1      2.2±0.1   0.2±0.1   4.0±0.1   87±1

Table 4: Average errors ⟨E⟩, ⟨FP⟩, ⟨FN⟩ and relative mutual information MI%
obtained from 100 runs of the comparator for comparing one input of size N = 20 (left;
N = 60, right) and another of size N + Δ, using p_eq = 50%.

Figure 7: Effect of the probability of connection on the overall errors. On the left, the
errors of the comparator with varying probability p_conn^(1) of connection efferent from the
first layer, with p_conn^(2) = 0.75. On the right, the probability p_conn^(2) of efferent connection
from the second layer is varied, keeping p_conn^(1) = 0.3 constant.

3.6 Influence of connection density 

A key ingredient in this model is the suppression of a fraction of inter-layer connections
with probability 1 − p_conn, which is necessary to give higher-layer neurons the possibility
to encode varying features of correlated input pairs. For a systematic study we ran
simulations using a range of distinct probabilities of interconnecting the layers.
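
The random suppression of inter-layer connections can be sketched with a binary mask, as below. The function name `masked_weights` and the uniform initialization of the surviving weights are assumptions of this example and are not taken from the model definition.

```python
import numpy as np

def masked_weights(n_out, n_in, p_conn, rng):
    """Weight matrix in which each link is kept with probability p_conn."""
    mask = rng.random((n_out, n_in)) < p_conn            # True where a link exists
    weights = rng.uniform(-1.0, 1.0, size=(n_out, n_in)) * mask
    return weights, mask

rng = np.random.default_rng(0)
weights, mask = masked_weights(200, 200, 0.3, rng)
density = mask.mean()                                    # close to p_conn = 0.3
print(density)
```

In a learning loop, every weight update would be multiplied by the same `mask`, so that suppressed connections stay at zero throughout the simulation.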

In figure 7 we show the unconstrained performance measures for N = 5 when
changing (left) the connection probability p_conn^(1) from the input layer to the first layer
(compare fig. 1, with constant p_conn^(2) = 0.75) and (right) when varying the connection
probability p_conn^(2) from the second to the third layer. In the latter case we kept
p_conn^(1) = 0.3 fixed.

The data presented in fig. 7 show that the neural comparator loses functionality
when the network becomes fully interconnected. The optimal interconnection density
varies from layer to layer; performance is best for 10% efferent first-layer connections
and 60% efferent second-layer connections.

3.7 Image Comparison

We tested the comparator efficiency in comparing a set of black and white pictures of
small size (20x20 pixels, i.e. N = 400) using linear encoding via a random matrix
as in previous sections, see fig. 8. The set of pictures is very small (200 pictures) in
comparison to the input data used to train the comparator (t = 10^ inputs). The results
can be seen in table 5. The limited input set has the negative effect that the comparator
is not able to learn comparison only from this set. This suggests that in order for the
comparator to develop its functionality, it must sample a sizable part of the possible
input patterns.

Figure 8: Scheme of two possible encodings compared by the comparator with probability
p_eq. In both cases, after being exposed to many images the comparator would ideally
result in an output x ≈ 0. Notice that the matrix A on each side of the figure is constant
during the whole simulation.





                                           ⟨E⟩         ⟨FP⟩        ⟨FN⟩        MI%
    Only images, p_eq = 0.2                20.6±0.3    51.9±0.6    14.4±0.2    8±1
    Trained w/ random input, p_eq = 0.2    14.5±0.2    39.3±0.3    6.2±0.1     32±1
    Trained w/ random input, p_eq = 0.5    10.8±0.1    10.7±0.2    10.9±0.1    51±1

Table 5: Average errors ⟨E⟩, ⟨FP⟩, ⟨FN⟩ and relative mutual information MI%
obtained from 100 runs of the comparator for comparing black and white pictures of size
20x20 pixels.



As explained previously, the correlated inputs are minimized by the anti-Hebbian
rule, while the uncorrelated input cannot be minimized to the same level, since those
cases result in the terms x_i x_j in eq. (5) being essentially random. This assumption
is however not fulfilled if the values of these terms are not well distributed (unless
their values are by chance always small), which is the case if the sampling is not large
enough.

As a second test, we initially trained the comparator using random data (still using
p_eq = 0.5) in order to start with a functional distribution of the synaptic weights, and
then switched to the picture set for the last 10% of the calculation, with the comparator
still learning during this stage. In this case, the comparator achieved its function (see
table 5). However, the accuracy did not fully reach that of the system when comparing
randomly generated data.

We expect the accuracy of the comparator to reach the level obtained with
random input if the input stream explores a sizable part of the possible input. For
instance, ideally the image input would be a video of the visual input of a mobile agent
exploring its environment, such that a large number of patterns are processed by
the comparator. This is however beyond the scope of this work and is left for follow-up
work.



4 Interpretation within the scope of fuzzy logic 

The dependency of the output of the comparator seen in fig. and fig. 9 can be
interpreted in terms of fuzzy logic (Keller et al. 1992), offering alternative application
scenarios for the neural comparator.

The error measures evaluated in table 2, like the incidence of false positives (FP),
are based on Boolean logic: the classification is either correct or incorrect, i.e. binary.
For real-world applications the input pairs (y, z) may be similar but not equal, and the
dependency of the output on the input similarity is an important relation
characterizing the functionality of neural comparators.

The comparator essentially provides a continuous variable classifying how much 
the input case corresponds to the case of equal input, i.e. a truth degree. Thus, the 
comparator can be interpreted as a fuzzy logic gate for the operator "equals" (=), since 
it provides a truth degree for the outcome of the discrete version of the same operator. 

In fig. 9 we present, on a logarithmic scale, the density of results for the observed
output x as a function of the distance d between the respective inputs, for one single
run of the comparator. 80% of the input vectors were randomly drawn and later
readjusted in order to fill the range of distances d = 0.1 to 1.5 uniformly, according to the
constrained protocol (17). In addition, 20% of the input pairs have a distance of d = 0 with
z = y, resulting in the high density of simulations at d = 0.

Figure 9: Density of output-distance pairs from two individual runs of the comparator,
comparing inputs of dimension N = 5 under the direct encoding (left) and linear
encoding (right), using α = 2.7.

The uncertainty of the classification of inputs presented in fig. 9 is reflected in a
probability distribution for the comparator output, shown in fig. 10 for the case of direct
encoding. The output distribution is narrower for cases where the distance d
corresponds to clearly correlated or uncorrelated inputs.

The distributions presented in fig. 9 can be interpreted as fuzzy probability
distributions for any given distance d (vertical slices), as shown in fig. 10. The probability for
the input pairs y and z to be classified as different decreases with decreasing distance
d between them. This shows that inputs with smaller distances have in general
increasingly weaker outputs. Thus, assuming that the Euclidean distance d is a good estimator
of how similar the input is, the output of the comparator provides an arguably reliable
continuous variable estimating a similarity degree for the inputs, i.e. the truth degree of
the operator "equals" applied to the inputs.

Figure 10: Output distribution for different Euclidean distances (15) between the
inputs, as a function of output, corresponding to a vertical slice of the N = 5 direct
encoding data presented in fig. 9.
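
The vertical slices discussed in this section correspond to the distribution of outputs conditioned on a distance bin. A possible way to extract them from recorded (d, x) pairs is sketched below. The function name and the toy data are assumptions made purely for illustration; the toy outputs are not results of the comparator.

```python
import numpy as np

def output_distribution_per_distance(d, x, d_bins, x_bins):
    """Histogram of outputs x, normalized separately within each distance bin."""
    counts, d_edges, x_edges = np.histogram2d(d, x, bins=(d_bins, x_bins))
    totals = counts.sum(axis=1, keepdims=True)
    # Leave empty distance bins at zero instead of dividing by zero.
    probs = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return probs, d_edges, x_edges

rng = np.random.default_rng(2)
d = rng.uniform(0.0, 1.5, 5000)                               # toy distances
x = np.clip(d / 1.5 + rng.normal(0.0, 0.05, 5000), 0.0, 1.0)  # toy stand-in outputs
probs, d_edges, x_edges = output_distribution_per_distance(d, x, 15, 20)
print(probs.shape)
```

Each row of `probs` is one vertical slice: the estimated probability of the output falling into each output bin, given that the distance lies in the corresponding distance bin.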

5 Discussion 



The results presented here demonstrate that the proposed neural comparator has the
capability of discerning similar input vectors from dissimilar ones, even under noisy
conditions. Using 80% noise, with four out of five inputs being randomly drawn, the
unsupervised comparator architecture achieves a Boolean discrimination accuracy
above approximately 90%. The comparator circuit can also achieve the same accuracy when the
inputs to be compared are encoded differently. If the encodings of both inputs are
related by a linear relation, the accuracy of the comparison does not worsen with respect
to the direct encoding case.

A key factor for the accuracy of the method is the inclusion of a slightly different
path for the layer-to-layer information, provided by random suppressions of interlayer
connections. However, the suppression has the potential side effect of rendering some
of the correlations difficult to learn. For this reason a compromise needs to be
found between the number of connections that must be kept in order to keep the
network functional and the number of connections that need to be removed to generate
sufficiently different outputs in the third layer.

We find it remarkable that from a very simple model of interacting neurons under the
rule of minimization of its output, the fairly complex task of identifying the similarity
between unrelated inputs can emerge through self-organization without the need for any
predefined or externally given information. Complexity arising from simple interactions
is a characteristic of natural systems, and we believe the capacity of many living beings
to perform comparison operations could potentially be based on some of the aspects
included in our model.



Conclusion 

We have presented a neuronal circuit based on a feed-forward artificial neural network, 
which is able to discriminate whether two inputs are equal or different with high accu- 
racy even under noisy input conditions. 

Our model is an example of how algorithmic functionalities can emerge from the 
interaction of individual neurons under strictly local rules, in our case the minimization 
of the output, without hard-wired encoding of the algorithm, without external super- 
vision and without any a priori information about the objects to be compared. Since 
our model is capable of comparing information in different encodings, it would be a 
suitable model of how seemingly unrelated information coming from different areas of 
a brain can be integrated and compared. 

We view the architecture proposed here as a first step towards an in-depth study of
the important question of which neural circuits are possible for the unsupervised
comparison of unknown objects. Our results show that anti-Hebbian adaption rules, which
are optimal for synaptic information transmission (Bell and Sejnowski 1995), make it
possible to compare two novel objects, viz. objects never encountered before during
training, with respect to their similarity. The model is capable not only of providing binary
answers, i.e. whether the two objects in the sensory stream are or are not identical, but
also of giving a quantitative estimate of the degree of similarity, which may be interpreted
in the context of fuzzy logic. We believe this quantitative estimate of similarity to be a
central aspect of any neural comparator, as it may be used as a learning or reinforcement
signal.